VisuLogic provides a benchmark and training dataset to evaluate and enhance MLLMs' visual reasoning.
🌐 Homepage | 🏆 Leaderboard | 📖 Paper | 🤗 Benchmark | 💻 Eval Code | 🤗 Train Data | 💻 Train Code
## 📖 Introduction
VisuLogic is a newly designed benchmark for evaluating the visual reasoning capabilities of Multimodal Large Language Models (MLLMs) independently of textual reasoning. It features carefully constructed visual reasoning tasks divided into six types according to the reasoning skills they require (e.g., Quantitative Reasoning, which involves understanding and deducing changes in the quantity of elements in an image). Unlike existing benchmarks, VisuLogic poses tasks that are inherently difficult to articulate in language, providing a more rigorous evaluation of MLLMs' visual reasoning. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning.
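To make the evaluation protocol concrete, here is a minimal sketch of how a model could be scored on a multiple-choice benchmark of this kind. The Hugging Face dataset path, the field names, and the `query_model` callable are illustrative assumptions rather than the official schema; see the Eval Code linked above for the actual pipeline.

```python
import re
from datasets import load_dataset  # pip install datasets


def extract_choice(response: str):
    """Pull the first standalone option letter (A-D) from a model's raw response."""
    match = re.search(r"\b([A-D])\b", response.strip())
    return match.group(1) if match else None


def evaluate(query_model, split: str = "test") -> float:
    """Score a model on VisuLogic-style multiple-choice items.

    `query_model(image, question)` is a user-supplied callable returning the
    model's raw text answer. The dataset path and field names below are
    assumptions for illustration, not the official release layout.
    """
    data = load_dataset("VisuLogic/VisuLogic", split=split)  # hypothetical path
    correct = 0
    for item in data:
        response = query_model(item["image"], item["question"])
        if extract_choice(response) == item["answer"]:
            correct += 1
    return correct / len(data)
```

With any MLLM client plugged in as `query_model`, the returned accuracy is directly comparable to the ~25% random baseline and the 51.4% human score quoted above.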
## 🌟 Key Features
### 🚀 Visuo-Logical Challenge
The first benchmark to integrate visual perception with logical reasoning, enabling authentic multimodal evaluation.
### 🛠️ Rigorous Design
Includes 1,000 meticulously curated questions, spanning 6 domains and 23 subcategories, for comprehensive performance evaluation.
### 📝 Anti-Linguistic Shortcut
Designed so that tasks cannot be short-circuited by purely linguistic reasoning, ensuring they rely on genuine visual reasoning rather than textual shortcuts.
### 💡 RL Exploration
We identify reinforcement learning (RL) as a promising direction for improving the visual reasoning capabilities of MLLMs. With RL fine-tuning, models reach state-of-the-art (SOTA) performance on VisuLogic!
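As a rough illustration of what such an RL setup optimizes, below is a minimal sketch of a rule-based reward function of the kind commonly used in verifiable-reward RL fine-tuning (e.g., GRPO-style training). The `<think>` tag format, the option-letter convention, and the reward weights are assumptions made for illustration, not the project's actual training code; see the Train Code linked above for that.

```python
import re


def format_reward(response: str) -> float:
    """Small bonus when the response wraps its reasoning in <think>...</think> tags
    (an assumed output format, common in RL-trained reasoning models)."""
    return 0.5 if re.search(r"<think>.*</think>", response, re.DOTALL) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """Full reward when the last standalone option letter matches the ground truth."""
    match = re.search(r"\b([A-D])\b(?!.*\b[A-D]\b)", response, re.DOTALL)
    return 1.0 if match and match.group(1) == ground_truth else 0.0


def reward(response: str, ground_truth: str) -> float:
    """Combined scalar reward fed back to the RL trainer for each sampled rollout."""
    return accuracy_reward(response, ground_truth) + format_reward(response)
```

Because the answer is a verifiable option letter, the reward needs no learned reward model, which is what makes multiple-choice visual reasoning a convenient target for RL exploration.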
### ✅ Fully Open-source
We open-source all the evaluation code, training scripts, and datasets associated with this work to promote further research and innovation.